MetaClustering: Discovery of the Different Sample Clusterings in Gene Expression Data
Abstract
Clustering of the samples is a standard procedure for the analysis of gene expression data, for instance to discover cancer subtypes. However, more than one biologically meaningful clustering can exist, depending on the genes chosen. We propose here to group the genes according to the clustering of the samples they fit. This makes it possible to determine directly the different clusterings of the samples present in the data. As a clustering is a structure, genes belonging to the same group are functions of the same structure. Hence, determining groups of genes that support the same clustering can also be viewed as detecting non-linearly linked genes. MetaClustering was applied successfully to simulated data. It also recovered the known clustering of real cancer data, which was impossible using the complete set of genes. Finally, it clustered together cell-cycle genes, showing its ability to group genes related in a non-linear way.
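The idea of grouping genes by the sample clustering they fit can be illustrated with a minimal sketch. This is not the MetaClustering algorithm itself, whose details the abstract does not give. It assumes that each gene splits the samples into exactly two groups (found here by a one-dimensional two-means), that two genes "fit the same clustering" when their splits agree on at least 90% of the samples, and that genes are grouped greedily. The function names, the threshold, and the toy matrix are all hypothetical.

```python
import numpy as np

def gene_partition(values):
    """Split the samples into two groups with a 1-D two-means on one gene (assumed model)."""
    lo, hi = values.min(), values.max()
    for _ in range(20):
        # True where the sample is strictly closer to the high centre
        labels = np.abs(values - hi) < np.abs(values - lo)
        if labels.all() or not labels.any():
            break  # degenerate split, e.g. a constant gene
        lo, hi = values[~labels].mean(), values[labels].mean()
    return labels

def partition_agreement(a, b):
    """Fraction of samples on which two binary splits agree, ignoring label swaps."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)

def group_genes(expr, threshold=0.9):
    """Greedily group genes (rows of expr) whose sample splits agree (assumed threshold)."""
    groups = []  # list of (reference split, member gene indices)
    for g, row in enumerate(expr):
        p = gene_partition(row)
        for ref, members in groups:
            if partition_agreement(ref, p) >= threshold:
                members.append(g)
                break
        else:
            groups.append((p, [g]))
    return [members for _, members in groups]

# Toy data: 5 genes x 8 samples. Genes 0-2 follow one split of the samples
# (gene 1 is inverted, so it is not positively correlated with gene 0),
# genes 3-4 follow a different split.
expr = np.array([
    [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0],
    [9.0, 9.0, 9.0, 9.0, 1.0, 1.0, 1.0, 1.0],
    [0.1, 0.2, 0.0, 0.1, 3.0, 3.2, 3.1, 2.9],
    [0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0],
    [7.0, 2.0, 7.0, 2.0, 7.0, 2.0, 7.0, 2.0],
])
groups = group_genes(expr)
print(groups)
```

Each resulting group of genes corresponds to one clustering of the samples, so the number of groups is the number of distinct clusterings detected. Because agreement is measured on the splits rather than on the expression values, genes that support the same split land in the same group whatever the shape of their relationship.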
Similar resources
Visualized Classification of Multiple Sample Types
The goal of knowledge discovery and data mining is to extract useful knowledge from the given data. Visualization enables us to find structures, features, patterns, and relationships in a dataset by presenting the data in various graphical forms with possible interactions. Recently, DNA microarray technology provides a broad snapshot of the state of the cell by measuring the expression le...
Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis
Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...
Supplementary information to “Clustering and Metaclustering with Nonnegative Matrix Decompositions” Metaclustering gene expression data: the Meyerson lung cancer dataset
In the following we show that metaclustering is successful at biclustering a large lung cancer dataset from the Meyerson lab [1]. Using HG-U95Av2 Affymetrix oligonucleotide microarrays, Bhattacharjee et al. [1] have measured mRNA expression levels of 12600 transcript sequences (genes) in 186 lung tumor samples (139 adenocarcinomas, 21 squamous cell lung carcinomas, 6 small cell lung cancers, 20...
Complete Hierarchical Cut-Clustering: An Analysis of Guarantee and Quality
There are many algorithms for dividing a graph into parts, so-called clusters. An essential question is how dense these clusters are. This can be measured by the intra-cluster expansion. The cut-clustering algorithm as presented by Flake et al. [FTT04] provides a theoretical guarantee on the intra-cluster expansion, which for example greedy clustering approaches cannot give, as calculating the...
Publication date: 2007